11 research outputs found
What is the IQ of your data transformation system?
Mapping and translating data across different representations is a crucial problem in information systems. Many formalisms and tools are currently used for this purpose, to the point that devel- opers typically face a difficult question: “what is the right tool for my translation task?” In this paper, we introduce several techniques that contribute to answer this question. Among these, a fairly gen- eral definition of a data transformation system, a new and very effi- cient similarity measure to evaluate the outputs produced by such a system, and a metric to estimate user efforts. Based on these tech- niques, we are able to compare a wide range of systems on many translation tasks, to gain interesting insights about their effective- ness, and, ultimately, about their “intelligence”
Quality of mappings for data exchange applications
Research has investigated mappings among data sources under two perspectives.
On one side, there are studies of practical tools for schema mapping generation;
these focus on algorithms to generate mappings based on visual specifications
provided by users. On the other side, we have theoretical researches about data
exchange. These study how to generate a solution -- i.e., a target instance -- given a
set of mappings usually specified as tuple generating dependencies.
However, these two research lines have progressed in a rather independent way
and we are still far away from having a complete understanding of the properties
that a "good" schema mapping system should have; to give an example, there are
many possible solutions for a data exchange problem.
In fact, there is no consensus yet on a notion of quality for schema mappings. In this
thesis, based on concepts provided by schema mapping and data exchange
research, we aim at investigate such a notion.
Our goal is to identify a fairly general formal context that incorporates the different
mapping-generation systems proposed in the literature, and to develop algorithms,
tools and methods for characterizing the quality of mappings generated by those
systems
ATOM: Automatic Target-driven Ontology Merging
Abstract — The proliferation of ontologies and taxonomies in many domains increasingly demands the integration of multiple such ontologies to provide a unified view on them. We demonstrate a new automatic approach to merge large taxonomies such as product catalogs or web directories. Our approach is based on an equivalence matching between a source and target taxonomy to merge them. It is target-driven, i.e. it preserves the structure of the target taxonomy as much as possible. Further, we show how the approach can utilize additional relationships between source and target concepts to semantically improve the merge result. I
Core Schema Mappings
Research has investigated mappings among data sources under two perspectives. On one side, there are studies of practical tools for schema mapping generation; these focus on algorithms to generate mappings based on visual specifications provided by users. On the other side, we have theoretical researches about data exchange. These study how to generate a solution – i.e., a target instance – given a set of mappings usually specified as tuple generating dependencies. However, despite the fact that the notion of a core of a data exchange solution has been formally identified as an optimal solution, there are yet no mapping systems that support core computations. In this paper we introduce several new algorithms that contribute to bridge the gap between the practice of mapping generation and the theory of data exchange. We show how, given a mapping scenario, it is possible to generate an executable script that computes core solutions for the corresponding data exchange problem. The algorithms have been implemented and tested using common runtime engines to show that they guarantee very good performances, orders of magnitudes better than those of known algorithms that compute the core as a post-processing step
Concise and Expressive Mappings with +Spicy
We introduce the +Spicy mapping system. The system is based on a number of novel algorithms that contribute to increase the quality and expressiveness of mappings. +Spicy integrates the computation of core solutions in the mapping generation process in a highly efficient way, based on a natural rewriting of the given mappings. This allows for an efficient implementation of core computations using common runtime languages like SQL or XQuery and guarantees very good performances, orders of magnitude better than those of previous algorithms. The rewriting algorithm can be applied both to mappings generated by the system, or to pre-defined mappings provided as part of the input. To do this, the system was enriched with a set of expressive primitives, so that +Spicy is the first mapping system that brings together a sophisticate and expressive mapping generation algorithm with an efficient strategy to compute core solutions. 1
Core schema mappings: Scalable core computations in data exchange
Research has investigated mappings among data sources under two perspectives. On one side, there are studies of practical tools for schema mapping generation; these focus on algorithms to generate mappings based on visual specifications provided by users. On the other side, we have theoretical researches about data exchange. These study how to generate a solution – i.e., a target instance – given a set of mappings usually specified as tuple generating dependencies. Since the notion of a core solution has been formally identified as an optimal solution, it is very important to efficiently support core computations in mapping systems. In this paper we introduce several new algorithms that contribute to bridge the gap between the practice of mapping generation and the theory of data exchange. We show how, given a mapping scenario, it is possible to generate an executable script that computes core solutions for the corresponding data exchange problem. The algorithms have been implemented and tested using common runtime engines to show that they guarantee very good performances, orders of magnitudes better than those of known algorithms that compute the core as a post-processing step.
The Spicy system: Towards a Notion of Mapping Quality
We introduce the Spicy system, a novel approach to the problem of automatically selecting the best mappings among two data sources. Known schema mapping algorithms rely on value correspondences -- i.e. correspondences among semantically related attributes -- to produce complex transformations among data sources. Spicy brings together schema matching and mapping generation tools to further automate this process. A key observation, here, is that the quality of the mappings is strongly influenced by the quality of the input correspondences. To address this problem, Spicy adopts a three-layer architecture, in which a schema matching module is used to provide input to a mapping generation module. Then, a third module, the mapping verification module, is used to check candidate mappings and choose the ones that represent better transformations of the source into the target. At the core of the system stands a new technique for comparing the structure and actual content of trees, called structural analysis. Experimental results show that our mapping discovery algorithm achieves both good scalability and high precision in mapping selection